MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels

Authors

  • Tobias Marschall
  • Iman Hajirasouliha
  • Alexander Schönhuth
Abstract

MOTIVATION: Accurately predicting and genotyping indels longer than 30 bp has remained a central challenge in next-generation sequencing (NGS) studies. While indels of up to 30 bp are reliably processed by standard read aligners and the Genome Analysis Toolkit (GATK), longer indels have still resisted proper treatment. Also, discovering and genotyping longer indels has become particularly relevant owing to the increasing attention in globally concerted projects.

RESULTS: We present MATE-CLEVER (Mendelian-inheritance-AtTEntive CLique-Enumerating Variant findER) as an approach that accurately discovers and genotypes indels longer than 30 bp from contemporary NGS reads with a special focus on family data. For enhanced quality of indel calls in family trios or quartets, MATE-CLEVER integrates statistics that reflect the laws of Mendelian inheritance. MATE-CLEVER's performance rates for indels longer than 30 bp are on a par with those of the GATK for indels shorter than 30 bp, achieving up to 90% precision overall, with >80% of calls correctly typed. In predicting de novo indels longer than 30 bp in family contexts, MATE-CLEVER even raises the standards of the GATK. MATE-CLEVER achieves precision and recall of ∼63% on indels of 30 bp and longer versus 55% in both categories for the GATK on indels of 10-29 bp. A special version of MATE-CLEVER has contributed to indel discovery, in particular for indels of 30-100 bp, the 'NGS twilight zone of indels', in the Genome of the Netherlands Project.

AVAILABILITY AND IMPLEMENTATION: http://clever-sv.googlecode.com/
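To make the notion of "statistics that reflect the laws of Mendelian inheritance" concrete, the following is a minimal sketch of a trio consistency check. It is not MATE-CLEVER's actual model, which the abstract does not describe. It assumes a biallelic autosomal indel and encodes each genotype as the number of alternate-allele copies (0, 1 or 2). The function names are hypothetical.

```python
from itertools import product

# Assumption: genotypes are alt-allele counts at a biallelic autosomal site.
# A parent passes on one allele; the map gives P(transmit alt count | genotype).
TRANSMISSION = {
    0: {0: 1.0},          # homozygous reference: always passes the reference allele
    1: {0: 0.5, 1: 0.5},  # heterozygous: either allele with equal probability
    2: {1: 1.0},          # homozygous alternate: always passes the alternate allele
}


def child_distribution(mother, father):
    """Return P(child genotype | parental genotypes) under Mendelian inheritance."""
    dist = {0: 0.0, 1: 0.0, 2: 0.0}
    for (m, pm), (f, pf) in product(TRANSMISSION[mother].items(),
                                    TRANSMISSION[father].items()):
        dist[m + f] += pm * pf
    return dist


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can arise from the parents' genotypes."""
    return child_distribution(mother, father)[child] > 0.0


def is_de_novo_candidate(child, mother, father):
    """True if the child carries the variant while neither parent does."""
    return child > 0 and mother == 0 and father == 0
```

In this sketch, a heterozygous call in the child with both parents homozygous reference is flagged as inconsistent, which makes it either a genotyping error or a de novo candidate. A real caller would weigh such configurations against genotype likelihoods rather than hard calls.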

Similar resources

Detailed MATE-CLEVER Pipeline for GoNL

For deletion discovery, we ran the discovery part of MATE-CLEVER [3], with minor modifications that account for volatilities among library protocols. MATE-CLEVER is an integrated approach. Its major purpose in the frame of the project is to discover deletions of size 30–100 bp (sometimes termed the “twilight zone of NGS indels”). It incorporates CLEVER [2], as an internal segment size based app...

Using Mendelian inheritance to improve high-throughput SNP discovery.

Restriction site-associated DNA sequencing or genotyping-by-sequencing (GBS) approaches allow for rapid and cost-effective discovery and genotyping of thousands of single-nucleotide polymorphisms (SNPs) in multiple individuals. However, rigorous quality control practices are needed to avoid high levels of error and bias with these reduced representation methods. We developed a formal statistica...

Inheritance of the fertility restoration and genotyping of rice lines at the restoring fertility (Rf) loci using molecular markers

The combination of cytoplasmic male sterility (CMS) in one parent and a restorer gene (Rf) to restore fertility in another are indispensable for the development of hybrid varieties. To genotype rice lines at the restoring fertility (Rf) loci, 38 lines were crossed with a sterile tester (rfrf) line. Pollen fertility test was performed to identify sterile and fertile F1 hybrids. Seven lines were ...

Discovery of Mate Selection Attitudes of Single Girls: A Qualitative Study

Purpose: Mate selection and having a happy and successful marriage is one of the most important issues in the lives of many. One of the issues that affects the success rate of mate selection is the attitude toward this choice. The purpose of this study was to explore the attitudes of mate selection of single Iranian girls. Methods: This study was carried out with qualitative method, using thema...

Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding.

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverag...

Journal:
  • Bioinformatics

Volume 29, Issue 24

Pages: -

Publication date: 2013